AITopics | convex loss

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

Sarkar, Dhruv, Sinha, Abhishek

arXiv.org Machine Learning May-21-2026

We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint function for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling cumulative constraint violation ($\mathsf{CCV}$). For strongly convex losses, state-of-the-art algorithms achieve $O(\log T)$ regret and $O(\sqrt{T \log T})$ $\mathsf{CCV}$. The corresponding best-known bounds for convex losses are $O(\sqrt{T})$ regret and $O(\sqrt{T} \log T)$ $\mathsf{CCV}$. In this paper, we give a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ $\mathsf{CCV}$ for strongly convex losses, yielding an exponential improvement in the $\mathsf{CCV}$. For convex losses, our algorithm improves the $\mathsf{CCV}$ to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.
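To make the two quantities in this abstract concrete, here is a minimal sketch of how static regret and $\mathsf{CCV}$ can be measured in a one-dimensional toy problem. It is not the paper's algorithm: the learner is plain projected online gradient descent that also projects onto the constraints seen so far, and the loss $(x - a_t)^2$, the constraint $x - b_t \le 0$, the function name `run_coco`, and the random data are all assumptions chosen for illustration.

```python
import random


def clip(x, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def run_coco(T=500, seed=0):
    """Toy COCO loop: returns (static regret, cumulative constraint violation)."""
    rng = random.Random(seed)
    x = 0.0            # current action, chosen before the round's functions are seen
    upper = 1.0        # tightest constraint bound observed so far
    targets, bounds = [], []
    total_loss, ccv = 0.0, 0.0
    for t in range(1, T + 1):
        a_t = rng.uniform(-1.0, 1.0)   # hypothetical loss parameter
        b_t = rng.uniform(0.0, 1.0)    # hypothetical constraint parameter
        # The learner has already committed to x; now the round is revealed.
        total_loss += (x - a_t) ** 2       # loss f_t(x) = (x - a_t)^2
        ccv += max(0.0, x - b_t)           # violation of g_t(x) = x - b_t <= 0
        targets.append(a_t)
        bounds.append(b_t)
        # Gradient step with step size 1/(mu * t), mu = 2 for this loss,
        # followed by projection onto [-1, min of bounds seen so far].
        grad = 2.0 * (x - a_t)
        upper = min(upper, b_t)
        x = clip(x - grad / (2.0 * t), -1.0, upper)
    # Comparator: best fixed point that satisfies every constraint.
    x_star = clip(sum(targets) / T, -1.0, min(bounds))
    best_loss = sum((x_star - a) ** 2 for a in targets)
    return total_loss - best_loss, ccv


if __name__ == "__main__":
    regret, ccv = run_coco()
    print(f"regret={regret:.3f}  CCV={ccv:.3f}")
```

In this toy the violations telescope, so $\mathsf{CCV}$ never exceeds 1; that is a property of the one-dimensional setup, not of the general setting the paper studies.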

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.21107

Country: Asia > India (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)

Gradient Regularized Newton Boosting Trees with Global Convergence

Zozoulenko, Nikita, Falkowski, Daniel, Cass, Thomas, Gonon, Lukas

arXiv.org Machine Learning May-4-2026

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive $\ell_2$-regularization term proportional to the square root of the gradient norm at each iteration. We establish an $\mathcal{O}(\frac{1}{k^2})$ rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.
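The regularization idea in this abstract can be illustrated on a scalar problem, far from the boosting setting. The sketch below is an assumption-laden toy, not the paper's method: it minimizes $f(x) = \sqrt{1 + x^2}$, a standard example where the pure Newton step maps $x$ to $-x^3$ and so diverges from any start with $|x| > 1$, and it adds a damping term $\lambda = \sqrt{c\,|f'(x)|}$ to the Hessian. The constant `c` and the function names are hypothetical.

```python
def grad(x):
    """First derivative of f(x) = sqrt(1 + x^2)."""
    return x / (1.0 + x * x) ** 0.5


def hess(x):
    """Second derivative of f(x) = sqrt(1 + x^2)."""
    return (1.0 + x * x) ** -1.5


def newton(x0, iters, c=0.0):
    """Newton iteration with damping lam = sqrt(c * |gradient|).

    c = 0 gives the plain Newton step; c > 0 adds a regularizer
    proportional to the square root of the gradient norm.
    """
    x = x0
    for _ in range(iters):
        g = grad(x)
        lam = (c * abs(g)) ** 0.5
        x -= g / (hess(x) + lam)
    return x


if __name__ == "__main__":
    print("plain Newton, 5 steps from x0=2:      ", newton(2.0, 5))
    print("regularized Newton, 30 steps from x0=2:", newton(2.0, 30, c=1.0))
```

From the same starting point the plain iteration blows up within a few steps, while the damped iteration moves monotonically toward the minimizer at 0.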

artificial intelligence, hessian, machine learning, (16 more...)

arXiv.org Machine Learning

2605.00581

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)

514a70448c235ccb8b6842ef5e02ad3b-Paper.pdf

Neural Information Processing Systems Apr-25-2026, 21:41:50 GMT

artificial intelligence, machine learning, surrogate loss, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Convex Elicitation of Continuous Properties

Jessica Finocchiaro, Rafael Frongillo

Neural Information Processing Systems Feb-15-2026, 03:00:30 GMT

Neural Information Processing Systems http://nips.cc/

convex elicitable, identification function, loss function, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

dececdcbf0ea0162234a8fb4ab051415-Paper-Conference.pdf

Neural Information Processing Systems Feb-12-2026, 09:03:58 GMT

algorithm, convex loss, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Online Convex Optimization Over Erdős-Rényi Random Networks

Neural Information Processing Systems Feb-9-2026, 22:00:04 GMT

How the regret scales in a finite time horizon $T$ with problem parameters is a central theme in studies of different algorithms.

artificial intelligence, machine learning, optimization, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.05)
Oceania > Australia (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.69)

514a70448c235ccb8b6842ef5e02ad3b-Paper.pdf

Neural Information Processing Systems Feb-8-2026, 16:06:18 GMT

arxiv preprint arxiv, hypothesis, surrogate loss, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

18561617ca0b4ffa293166b3186e04b0-Supplemental-Conference.pdf

Neural Information Processing Systems Feb-7-2026, 16:44:08 GMT

divergence, noisy-sgd, privacy amplification, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.67)
(2 more...)

Online Frank-Wolfe with Arbitrary Delays

Neural Information Processing Systems Dec-24-2025, 13:55:59 GMT

The online Frank-Wolfe (OFW) method has gained much popularity for online convex optimization due to its projection-free property. Previous studies show that OFW can attain an $O(T^{3/4})$ regret bound for convex losses and an $O(T^{2/3})$ regret bound for strongly convex losses. However, they assume that each gradient queried by OFW is revealed immediately, which may not hold in practice and limits the application of OFW. To address this limitation, we propose a delayed variant of OFW, which allows gradients to be delayed by arbitrary rounds. The main idea is to perform an update similar to OFW after receiving any delayed gradient, and play the latest decision for each round. Despite its simplicity, we prove that our delayed variant of OFW is able to achieve an $O(T^{3/4}+dT^{1/4})$ regret bound for convex losses and an $O(T^{2/3}+d\log T)$ regret bound for strongly convex losses, where $d$ is the maximum delay. This is quite surprising since under a relatively large amount of delay (e.g., $d=O(\sqrt{T})$ for convex losses and $d=O(T^{2/3}/\log T)$ for strongly convex losses), the delayed variant of OFW enjoys the same regret bound as that of the original OFW.
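The main idea described above (update once per arriving gradient, always play the latest decision) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm or its analyzed step sizes: the decision set is the probability simplex, losses are linear with cost vectors `costs[t]`, the surrogate $F(x) = \eta \langle \sum g, x \rangle + \|x - x_1\|^2$ follows the usual OFW template, and `eta`, the mixing weight `sigma`, and all function names are hypothetical.

```python
def lmo_simplex(g):
    """Linear minimization oracle on the simplex: the vertex with the smallest g[i]."""
    best = min(range(len(g)), key=lambda j: g[j])
    return [1.0 if j == best else 0.0 for j in range(len(g))]


def delayed_ofw(costs, delays, eta=0.1):
    """Play one decision per round; the gradient of round t arrives at the
    end of round t + delays[t] - 1 (a delay of 1 means no delay)."""
    n = len(costs[0])
    x1 = [1.0 / n] * n
    x = list(x1)
    grad_sum = [0.0] * n
    arrivals = {}      # round index -> gradients that become available then
    played = []
    updates = 0
    for t, (c, d) in enumerate(zip(costs, delays), start=1):
        played.append(list(x))                 # play the latest decision
        # For a linear loss <c, x> the gradient is c itself.
        arrivals.setdefault(t + d - 1, []).append(c)
        for g in arrivals.pop(t, []):          # one OFW-style update per arrival
            updates += 1
            grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
            # Gradient of the surrogate F at the current point.
            sg = [eta * s + 2.0 * (xi - x0)
                  for s, xi, x0 in zip(grad_sum, x, x1)]
            v = lmo_simplex(sg)
            sigma = min(1.0, 2.0 / updates ** 0.5)   # illustrative step size
            # Convex combination keeps x feasible without any projection.
            x = [(1.0 - sigma) * xi + sigma * vi for xi, vi in zip(x, v)]
    return played


if __name__ == "__main__":
    costs = [[((7 * t + 3 * j) % 5) / 5.0 for j in range(3)] for t in range(50)]
    delays = [1 + t % 4 for t in range(50)]
    print(delayed_ofw(costs, delays)[-1])
```

Because every update is a convex combination of the current point and a vertex, each played decision stays in the simplex, which is the projection-free property the abstract refers to.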

convex loss, name change, online frank-wolfe, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Online Convex Optimization Over Erdős-Rényi Random Networks

Neural Information Processing Systems Dec-24-2025, 11:25:44 GMT

The work studies how node-to-node communications over an Erdős-Rényi random network influence distributed online convex optimization, which is vital in solving large-scale machine learning in antagonistic or changing environments. At each step, each node (computing unit) makes a local decision, experiences a loss evaluated with a convex function, and communicates the decision with other nodes over a network. The node-to-node communications are described by the Erdős-Rényi rule, where independently each link takes place with a probability $p$ over a prescribed connected graph. The objective is to minimize the system-wide loss accumulated over a finite time horizon. We consider standard distributed gradient descent with full gradients, one-point bandits and two-points bandits for convex and strongly convex losses, respectively. We establish how the regret bounds scale with respect to time horizon $T$, network size $N$, decision dimension $d$, and an algebraic network connectivity. The regret bounds scaling with respect to $T$ match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problems, e.g., $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(\ln(T))$ regrets are established for convex and strongly convex losses with full gradient feedback and two-points information, respectively. For classical Erdős-Rényi networks over all-to-all possible node communications, the regret scalings with respect to the probability $p$ are analytically established, based on which the tradeoff between the communication overhead and computation accuracy is clearly demonstrated. Numerical studies have validated the theoretical findings.
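The communicate-then-descend loop described in this abstract can be sketched in a few lines. The version below is a simplified illustration, not the paper's algorithm: decisions are scalars, the underlying graph is complete, each link is active independently with probability `p`, mixing uses a fixed weight $1/N$ per active link, and the local loss $(x - a_{i,t})^2$, the step size $1/\sqrt{t}$, and all function names are assumptions.

```python
import random


def sample_links(n, p, rng):
    """Each pair of nodes is linked this round independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def mix(x, links):
    """Consensus step: each active link pulls its endpoints together by 1/n
    of their gap. The update is symmetric, so the network average is preserved."""
    n = len(x)
    out = list(x)
    for i, j in links:
        out[i] += (x[j] - x[i]) / n
        out[j] += (x[i] - x[j]) / n
    return out


def distributed_ogd(targets, p, seed=0):
    """targets[t][i] is the parameter of node i's local loss (x - a)^2 in round t."""
    rng = random.Random(seed)
    n = len(targets[0])
    x = [0.0] * n
    for t, a_t in enumerate(targets, start=1):
        eta = 1.0 / t ** 0.5                       # illustrative step size
        x = mix(x, sample_links(n, p, rng))        # communicate over random links
        # Local full-gradient step, then projection onto [-1, 1].
        x = [max(-1.0, min(1.0, xi - eta * 2.0 * (xi - ai)))
             for xi, ai in zip(x, a_t)]
    return x


if __name__ == "__main__":
    print(distributed_ogd([[0.5, -0.5, 0.2]] * 20, p=0.5))
```

With `p = 1` every link is active and one mixing step moves all nodes exactly to the network average; smaller `p` means fewer messages per round and slower agreement, which is the communication versus accuracy tradeoff the abstract mentions.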

erdős-rényi random network, name change, online convex optimization, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)